Compounds in dictionary-based cross-language information retrieval

نویسنده

  • Turid Hedlund
چکیده

Compound words form an important part of natural language. From the cross-lingual information retrieval (CLIR) point of view it is important that many natural languages are highly productive with compounds, and translation resources cannot include entries for all compounds. Also, compounds are often content bearing words in a sentence. In Swedish, German and Finnish roughly one tenth of the words in a text prepared for information retrieval purposes are compounds. Important research questions concerning compound handling in dictionary-based cross-language information retrieval are 1) compound splitting into components, 2) normalisation of components, 3) translation of components and 4) query structuring for compounds and their components in the target language. The impact of compound processing on the performance of the cross-language information retrieval process is evaluated in this study and the results indicate that the effect is clearly positive.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Clef Experiments at Maryland: Statistical Stemming and Backoo Translation

The University of Maryland participated in the CLEF 2000 multilingual task, submitting three oocial runs that explored the impact of applying language-independent stemming techniques to dictionary-based cross-language information retrieval. The paper begins by describing a cross-language information retrieval architecture based on balanced document translation. A four-stage backoo strategy for ...

متن کامل

Improved Cross-Language Retrieval using Backoff Translation

The limited coverage of available translation lexicons can pose a serious challenge in some cross-language information retrieval applications. We present two techniques for combining evidence from dictionary-based and corpus-based translation lexicons, and show that backoff translation outperforms a technique based on merging lexicons.

متن کامل

CLEF Experiments at the University of Maryland: Statistical Stemming and Back-off Translation Strategies

The University of Maryland participated in the CLEF 2000 multilingual task, submitting three o cial runs that explored the impact of applying language-independent stemming techniques to dictionary-based cross-language information retrieval. The paper begins by describing a cross-language information retrieval architecture based on balanced document translation. A four-stage backo strategy for i...

متن کامل

Arabic/English Cross Language Information Retrieval Using a Bilingual Dictionary

With the increase of multilingual information available online and the increase of non-native English speaker (Arabic users) browsing the Internet, it has become more important to have information retrieval systems that can carry the retrieval process across language boundaries that is, cross language information retrieval CLIR systems. The CLIR system responds to the user query in a comprehens...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Inf. Res.

دوره 7  شماره 

صفحات  -

تاریخ انتشار 2002